Transmission spectroscopy is a powerful tool for decoding the chemical composition of exoplanet atmospheres. In this paper, we focus on unsupervised techniques for analyzing spectral data from transiting exoplanets. We demonstrate methods for: i) cleaning and validating the data, ii) initial exploratory data analysis based on summary statistics (estimates of location and variability), iii) exploring and quantifying the existing correlations in the data, iv) pre-processing and linearly transforming the data to its principal components, v) dimensionality reduction and manifold learning, vi) clustering and anomaly detection, vii) visualization and interpretation of the data. To illustrate the proposed unsupervised methodology, we use a well-known public benchmark dataset of synthetic transit spectra. We show that there is a high degree of correlation in the spectral data, which calls for an appropriate low-dimensional representation. We explore a number of different techniques for such dimensionality reduction and identify several suitable options in terms of summary statistics, principal components, etc. We find intriguing structures in the principal-component basis, namely well-defined branches corresponding to different chemical regimes of the underlying atmospheres. We demonstrate that those branches can be successfully recovered with a K-means clustering algorithm in a fully unsupervised fashion. We advocate a three-dimensional representation of the spectroscopic data in terms of the first three principal components, in order to reveal the existing structure in the data and quickly characterize the chemical class of a planet.
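Concretely, the core of steps iv) and vi) can be sketched in a few lines with scikit-learn. This is a minimal illustration rather than the paper's code: the `spectra` array below is a random placeholder standing in for the benchmark dataset of synthetic transit spectra, and the choice of four clusters is an assumption.

```python
# Minimal sketch of the PCA + K-means pipeline described above.
# `spectra` stands in for the benchmark dataset of synthetic transit
# spectra (rows = planets, columns = wavelength bins); the data and
# the number of clusters are illustrative assumptions.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
spectra = rng.normal(size=(1000, 52))  # placeholder for real spectra

# Standardize each wavelength bin, then project onto the first
# three principal components advocated in the text.
X = StandardScaler().fit_transform(spectra)
pca = PCA(n_components=3)
Z = pca.fit_transform(X)
print("explained variance ratios:", pca.explained_variance_ratio_)

# Recover the branch structure with fully unsupervised K-means.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
```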
The physical characteristics and atmospheric chemical composition of newly discovered exoplanets are often inferred from their transit spectra, which are obtained from complex numerical models of radiative transfer. Alternatively, simple analytical expressions provide insightful physical intuition into the relevant atmospheric processes. The deep learning revolution has opened the door to deriving such analytical results directly, with a computer algorithm fitting the data. As a proof of concept, we successfully demonstrate the use of symbolic regression on synthetic data for the transit radii of generic hot-Jupiter exoplanets to derive a corresponding analytical formula. As a pre-processing step, we use dimensional analysis to identify the relevant dimensionless combinations of variables and to reduce the number of independent inputs, which improves the performance of the symbolic regression. The dimensional analysis also allows us to mathematically derive and properly parametrize the most general family of degeneracies among the input atmospheric parameters that affect the characterization of an exoplanet atmosphere through transit spectroscopy.
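As a hedged illustration of this pipeline, the symbolic-regression step can be run with the open-source PySR package on dimensionless inputs. The dimensionless variables and the target relation below are toy stand-ins, not the paper's derived transit-radius formula.

```python
# Sketch of the dimensional-analysis + symbolic-regression pipeline,
# assuming the PySR package. The target is a toy dimensionless
# relation standing in for the transit radius, not the paper's result.
import numpy as np
from pysr import PySRRegressor

rng = np.random.default_rng(0)
# Dimensionless inputs produced by a (hypothetical) dimensional
# analysis step, e.g. x0 ~ H/R0 and x1 ~ kappa * P0 / g.
X = rng.uniform(0.01, 1.0, size=(500, 2))
y = 1.0 + X[:, 0] * np.log(X[:, 1])  # toy target: R/R0

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "*", "/"],
    unary_operators=["log"],
)
model.fit(X, y)
print(model)  # prints candidate analytic expressions
```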
Communication enables agents to cooperate to achieve their goals. Learning when to communicate, i.e., sparse (in time) communication, and whom to message is particularly important when bandwidth is limited. Recent work in learning sparse individualized communication, however, suffers from high variance during training, where decreasing communication comes at the cost of decreased reward, particularly in cooperative tasks. We use the information bottleneck to reframe sparsity as a representation learning problem, which we show naturally enables lossless sparse communication at lower budgets than prior art. In this paper, we propose a method for true lossless sparsity in communication via Information Maximizing Gated Sparse Multi-Agent Communication (IMGS-MAC). Our model uses two individualized regularization objectives, an information maximization autoencoder and a sparse communication loss, to create informative and sparse communication. We evaluate the learned communication `language' through direct causal analysis of messages in non-sparse runs to determine the range of lossless sparse budgets, which allow zero-shot sparsity, and the range of sparse budgets that will incur a reward loss, which is minimized by our learned gating function with few-shot sparsity. To demonstrate the efficacy of our results, we experiment in cooperative multi-agent tasks where communication is essential for success. We evaluate our model with both continuous and discrete messages. We focus our analysis on a variety of ablations to show the effect of message representations, including their properties, and the lossless performance of our model.
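A schematic PyTorch sketch of the two regularization objectives follows. The module shapes, names, and the communication budget are illustrative assumptions, not the IMGS-MAC reference implementation.

```python
# Schematic sketch of the two regularizers described above: an
# autoencoding (information-maximization) loss on the message and a
# sparsity loss on a learned gate. All names and the budget value
# are illustrative assumptions.
import torch
import torch.nn as nn

class GatedCommAgent(nn.Module):
    def __init__(self, obs_dim=16, msg_dim=8):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, msg_dim)   # message encoder
        self.decoder = nn.Linear(msg_dim, obs_dim)   # reconstructs obs
        self.gate = nn.Sequential(nn.Linear(obs_dim, 1), nn.Sigmoid())

    def forward(self, obs):
        msg = self.encoder(obs)
        g = self.gate(obs)            # probability of sending
        sent = g * msg                # gated (soft) message
        recon = self.decoder(msg)
        # Info-max term: the message must reconstruct the observation.
        info_loss = ((recon - obs) ** 2).mean()
        # Sparsity term: penalize communication above a budget.
        budget = 0.3                  # assumed target send-rate
        sparse_loss = torch.relu(g.mean() - budget)
        return sent, info_loss + sparse_loss

agent = GatedCommAgent()
sent, reg_loss = agent(torch.randn(4, 16))
```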
In this draft article, we consider the problem of safely controlling systems for which the safety index (or, loosely, the control barrier function) has relative degree equal to two. We consider parameter-affine nonlinear dynamical systems and assume that the parametric uncertainty is uniform and either known a priori or updated online through an estimator/parameter adaptation law. Under this uncertainty, the usual CBF-QP safe-control approach takes the form of a robust optimization problem, in which both the right-hand side and the left-hand side of the inequality constraints depend on the unknown parameters. With the given representation of uncertainty, the CBF-QP safe control ends up being a convex semi-infinite problem. Using two different philosophies, one based on weak duality and the other on the lossless S-procedure, we arrive at identical SDP formulations of this robust CBF-QP problem. We thus show that the problem of safe control under known parametric uncertainty can be posed as a tractable convex problem and solved online. (This is a work in progress.)
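For intuition, the nominal (known-parameter) CBF-QP can be written in a few lines of CVXPY; the robust, uncertain-parameter version becomes the SDP discussed above. The sketch below uses an assumed double-integrator example with an exponential-CBF condition for relative degree two; all numbers are illustrative.

```python
# Minimal CVXPY sketch of a nominal CBF-QP for a double integrator
# (relative degree two). The robust version with parametric
# uncertainty becomes an SDP per the text; values are illustrative.
import cvxpy as cp

# Double-integrator state x = [p, v]; keep p <= p_max.
p, v = 0.8, 0.5
p_max, k1, k2 = 1.0, 2.0, 3.0

h = p_max - p                    # safety index, relative degree 2
h_dot = -v
u = cp.Variable()                # acceleration input
u_nom = 1.2                      # nominal (unsafe) control

# Exponential-CBF condition: h_ddot + k2*h_dot + k1*h >= 0,
# with h_ddot = -u for this system.
constraints = [-u + k2 * h_dot + k1 * h >= 0]
prob = cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), constraints)
prob.solve()
print("safe control:", u.value)   # clipped below u_nom to stay safe
```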
Adversarial Imitation Learning (AIL) is a class of popular state-of-the-art Imitation Learning algorithms commonly used in robotics. In AIL, an artificial adversary's misclassification is used as a reward signal that is optimized by any standard Reinforcement Learning (RL) algorithm. Unlike most RL settings, the reward in AIL is $differentiable$, but current model-free RL algorithms do not make use of this property to train a policy. The reward in AIL is also shaped since it comes from an adversary. We leverage the differentiability property of the shaped AIL reward function and formulate a class of Actor Residual Critic (ARC) RL algorithms. ARC algorithms draw a parallel to the standard Actor-Critic (AC) algorithms in the RL literature and use a residual critic, a $C$ function (instead of the standard $Q$ function), to approximate only the discounted future return (excluding the immediate reward). ARC algorithms have similar convergence properties as the standard AC algorithms, with the additional advantage that the gradient through the immediate reward is exact. For the discrete (tabular) case with finite states, actions, and known dynamics, we prove that policy iteration with the $C$ function converges to an optimal policy. In the continuous case with function approximation and unknown dynamics, we experimentally show that ARC-aided AIL outperforms standard AIL in simulated continuous-control and real robotic manipulation tasks. ARC algorithms are simple to implement and can be incorporated into any existing AIL implementation with an AC algorithm. Video and link to code are available at: https://sites.google.com/view/actor-residual-critic.
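The ARC actor update can be sketched in PyTorch as follows: the immediate reward enters the actor's objective exactly (and differentiably), while a learned $C$ network supplies only the residual future return. Network sizes and the stand-in reward are assumptions, not the authors' code (see the link above).

```python
# Schematic sketch of the Actor Residual Critic idea: the actor's
# objective is the exact differentiable reward plus a residual
# critic C (future return excluding the immediate reward). Shapes
# and the reward function are illustrative assumptions.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                      nn.Linear(64, act_dim))
critic_C = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                         nn.Linear(64, 1))

def ail_reward(s, a):
    # Stand-in for the differentiable adversary-based AIL reward.
    return -(a ** 2).sum(dim=-1, keepdim=True)

s = torch.randn(32, obs_dim)
a = actor(s)
# Actor loss: maximize exact immediate reward + residual future return.
actor_loss = -(ail_reward(s, a) + critic_C(torch.cat([s, a], dim=-1))).mean()
actor_loss.backward()  # gradient through the reward term is exact
```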
Neural agents trained in reinforcement learning settings can learn to communicate with each other via discrete tokens, accomplishing as a team what agents would be unable to do alone. However, the current standard of using one-hot vectors as discrete communication tokens prevents agents from acquiring more desirable aspects of communication, such as zero-shot understanding. Inspired by word-embedding techniques from natural language processing, we propose neural agent architectures that enable agents to communicate via discrete tokens derived from a learned, continuous space. We show in a decision-theoretic framework that our technique optimizes communication over a wide range of scenarios, whereas one-hot tokens are optimal only under restrictive assumptions. In self-play experiments, we validate that our trained agents learn to cluster tokens in semantically meaningful ways, allowing them to communicate in noisy environments where other techniques fail. Lastly, we demonstrate both that agents using our method can effectively respond to novel human communication and that humans can understand unlabeled emergent agent communication, outperforming the use of one-hot communication.
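One plausible reading of "discrete tokens derived from a learned, continuous space" is nearest-neighbor quantization against learned token embeddings with a straight-through gradient. The sketch below illustrates that mechanism under assumed dimensions; it should not be taken as the paper's exact architecture.

```python
# Hedged sketch: a continuous message is snapped to its nearest
# learned token embedding, with a straight-through gradient so the
# sender remains trainable. Vocabulary size and dimensions are
# illustrative assumptions.
import torch
import torch.nn as nn

vocab_size, msg_dim = 16, 8
tokens = nn.Parameter(torch.randn(vocab_size, msg_dim))  # learned token embeddings

def discretize(msg):
    # Nearest token in the learned continuous space.
    dists = torch.cdist(msg, tokens)          # (batch, vocab)
    idx = dists.argmin(dim=-1)                # discrete token ids
    quantized = tokens[idx]
    # Straight-through estimator: forward uses the discrete token,
    # backward passes gradients to the continuous message.
    return msg + (quantized - msg).detach(), idx

msg = torch.randn(4, msg_dim)                 # continuous message
sent, token_ids = discretize(msg)
```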